

Proximal Mapping for Deep Regularization

Neural Information Processing Systems

Underpinning the success of deep learning are effective regularizers that allow a variety of priors in the data to be modeled.


Review for NeurIPS paper: Proximal Mapping for Deep Regularization

Neural Information Processing Systems

The paper concretely proposes and experimentally evaluates two such regularizers: one that encourages an LSTM to be robust to changes in its inputs, and one that regularizes embeddings to be close to each other in multi-view learning. The reviewers all agreed that the method was clearly presented, well motivated, and an important contribution. The consensus is therefore to accept.


Review for NeurIPS paper: Proximal Mapping for Deep Regularization

Neural Information Processing Systems

Summary and Contributions: This work proposes the use of proximal mapping to introduce certain data-dependent regularizers on neural network activations. The authors introduce two different regularization methods based on this idea. The first is a regularization on the outputs of a recurrent network (LSTM) that encourages robustness to perturbations in the input. This regularizer has a closed-form solution, though second-order derivatives are required. The second regularization method controls the correlation between activations of hidden layers on two different data sets, similar to deep CCA (DCCA).
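For intuition only (a schematic sketch, not the authors' formulation): a multi-view regularizer of the kind described above can be as simple as a penalty that pulls the paired hidden embeddings of two views toward each other, so that minimizing it encourages correlated representations:

```python
import numpy as np

def multiview_penalty(h1, h2):
    """Mean squared distance between paired embeddings of two views.
    Minimizing it pulls the two networks' representations together,
    in the spirit of (but much simpler than) deep CCA."""
    return 0.5 * float(np.mean(np.sum((h1 - h2) ** 2, axis=1)))

rng = np.random.default_rng(0)
h1 = rng.standard_normal((32, 8))            # view-1 embeddings (batch x dim)
h2 = h1 + 0.1 * rng.standard_normal((32, 8)) # slightly perturbed view 2
loss = multiview_penalty(h1, h2)             # small but positive
```

The names `multiview_penalty`, `h1`, and `h2` are illustrative; DCCA itself maximizes canonical correlation rather than minimizing a Euclidean gap.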


Proximal Mapping for Deep Regularization

Neural Information Processing Systems

Underpinning the success of deep learning are effective regularizers that allow a variety of priors in the data to be modeled. However, most regularizers are specified in terms of hidden layer outputs, which are not themselves optimization variables. In contrast to prevalent methods that optimize them indirectly through the model weights, we propose inserting proximal mapping as a new layer in the deep network, which directly and explicitly produces well-regularized hidden layer outputs. The resulting technique is shown to be closely connected to kernel warping and dropout, and novel algorithms are developed for robust temporal learning and multiview modeling, both outperforming state-of-the-art methods.
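To make the general idea concrete (a minimal sketch, not the paper's implementation): a proximal layer maps a hidden activation h to prox_{lam*Omega}(h) = argmin_z Omega(z) + (1/(2*lam))*||z - h||^2. For Omega equal to the l1 norm this mapping has the closed-form soft-thresholding solution, so a sparsity-inducing proximal layer can be written directly:

```python
import numpy as np

def prox_l1(h, lam):
    """Proximal mapping of lam * ||z||_1:
    argmin_z lam*||z||_1 + 0.5*||z - h||^2,
    solved in closed form by soft-thresholding."""
    return np.sign(h) * np.maximum(np.abs(h) - lam, 0.0)

# A "proximal layer": regularize the hidden activations directly,
# instead of penalizing them indirectly through the model weights.
h = np.array([0.3, -1.2, 0.05, 2.0])  # hidden layer output
z = prox_l1(h, lam=0.1)               # well-regularized (sparse) output
```

The l1 choice here is an assumption for illustration; the paper's regularizers for temporal robustness and multiview learning use different Omega, but each layer is the proximal map of its penalty in the same way.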


Batch-less stochastic gradient descent for compressive learning of deep regularization for image denoising

Shi, Hui, Traonmilin, Yann, Aujol, J-F

arXiv.org Artificial Intelligence

We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such a regularizer can be systematically linked with the distribution of the data. With deep neural networks (DNN), complex distributions can be recovered from a large training database. To reduce the computational burden of this task, we adapt the compressive learning framework to the learning of regularizers parametrized by DNN. We propose two variants of stochastic gradient descent (SGD) for the recovery of deep regularization parameters from a heavily compressed database. These algorithms outperform the initially proposed method, which was limited to low-dimensional signals and used information from the whole database at each iteration. They also benefit from classical SGD convergence guarantees. Thanks to these improvements, we show that this method can be applied to patch-based image denoising.
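As a toy stand-in for the idea (hedged: a linear sketch, not the paper's DNN regularizers or its two SGD variants), compressive learning fits parameters to a fixed low-dimensional sketch of the database, so each SGD step touches one sketch coordinate instead of the whole database:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear setting: the database is summarized once into a
# sketch s = A @ theta_true, and theta is recovered from s alone.
d, m = 5, 50
theta_true = rng.standard_normal(d)
A = rng.standard_normal((m, d))
s = A @ theta_true

theta = np.zeros(d)
lr = 0.01
for t in range(20000):
    i = rng.integers(m)                 # one sketch coordinate per step
    g = (A[i] @ theta - s[i]) * A[i]    # stochastic gradient of 0.5*(A[i]@theta - s[i])**2
    theta -= lr * g                     # no pass over the original database needed
```

In the paper the sketch is of a data distribution and theta parametrizes a deep regularizer; this linear version only illustrates why per-coordinate SGD steps avoid whole-database iterations.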